Opinion Mining for Biomedical Text Data: Feature Space Design and Feature Selection
Abstract
Unstructured text (e.g., journal articles) remains the primary means of publishing biomedical research results. Text mining is routinely applied to extract and integrate knowledge from such data. One important task is extracting relationships between bio-entities such as foods and diseases. Most existing studies, however, stop short of analyzing the extracted relationships further, for example their polarity and the level of certainty with which the authors reported a given relationship. The latter is termed the relationship strength and is marked at three levels: weak, medium, and strong. We have previously reported a preliminary study on this issue [22], and here we detail our studies on constructing a novel feature space for effectively predicting the polarity and strength of a relationship. Unlike previous work, four types of polarity instead of three are considered, namely positive, negative, neutral, and no-relationship. Another contribution is that, in addition to the commonly accepted lexicon-based features, we have identified a set of novel features that capture both the semantic and structural aspects of a relationship. Our intensive evaluations demonstrate that combining these new features with the lexicon-based ones achieves the best accuracy for polarity prediction (~0.91). This, however, is not the case for strength prediction, where lexicon-based features alone are sufficient (~0.96).
Related resources
A General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification in Telegram
Nowadays, the use of various messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based open-source text messaging service. According to the US Securities and Exchange Commission, and based on the statistics given for October 2019 to the present, 300 million people worldwide used Telegram per month. Telegram users are more concentrated in...
Feature extraction in opinion mining through Persian reviews
Opinion mining analyzes user reviews to extract their opinions, sentiments, and demands in a specific area, which can play an important role in making major decisions in that area. In general, opinion mining extracts user reviews at three levels: document, sentence, and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
Publication date: 2010